Results 1 - 20 of 53
1.
Cell Genom ; 4(4): 100527, 2024 Apr 10.
Article in English | MEDLINE | ID: mdl-38537634

ABSTRACT

The seventh iteration of the reference genome assembly for Rattus norvegicus-mRatBN7.2-corrects numerous misplaced segments and reduces base-level errors by approximately 9-fold and increases contiguity by 290-fold compared with its predecessor. Gene annotations are now more complete, improving the mapping precision of genomic, transcriptomic, and proteomics datasets. We jointly analyzed 163 short-read whole-genome sequencing datasets representing 120 laboratory rat strains and substrains using mRatBN7.2. We defined ∼20.0 million sequence variations, of which 18,700 are predicted to potentially impact the function of 6,677 genes. We also generated a new rat genetic map from 1,893 heterogeneous stock rats and annotated transcription start sites and alternative polyadenylation sites. The mRatBN7.2 assembly, along with the extensive analysis of genomic variations among rat strains, enhances our understanding of the rat genome, providing researchers with an expanded resource for studies involving rats.


Subject(s)
Genome , Genomics , Rats , Animals , Genome/genetics , Molecular Sequence Annotation , Whole Genome Sequencing , Genetic Variation/genetics
2.
bioRxiv ; 2024 Apr 15.
Article in English | MEDLINE | ID: mdl-38260597

ABSTRACT

The HXB/BXH family of recombinant inbred rat strains is a unique genetic resource that has been extensively phenotyped over 25 years, resulting in a vast dataset of quantitative molecular and physiological phenotypes. We built a pangenome graph from 10x Genomics Linked-Read data for 31 recombinant inbred rats to study genetic variation and association mapping. The pangenome includes 0.2 Gb of sequence that is not present in the reference mRatBN7.2, confirming the capture of substantial additional variation. We validated variants in challenging regions, including complex structural variants resolving into multiple haplotypes. Phenome-wide association analysis of validated SNPs uncovered variants associated with glucose/insulin levels and hippocampal gene expression. We propose an interaction between Pirl1l1, chromogranin expression, TNF-α levels, and insulin regulation. This study demonstrates the utility of linked-read pangenomes for comprehensive variant detection and mapping phenotypic diversity in a widely used rat genetic reference panel.

3.
bioRxiv ; 2023 Oct 17.
Article in English | MEDLINE | ID: mdl-37790531

ABSTRACT

Motivation: The increasing availability of complete genomes demands models to study genomic variability within entire populations. Pangenome graphs capture the full genomic similarity and diversity between multiple genomes. In order to understand them, we need to see them. For visualization, we need a human-readable graph layout: a graph embedding in a low-dimensional (e.g. two-dimensional) space. Due to a pangenome graph's potentially excessive size, this is a significant challenge. Results: In response, we introduce a novel graph layout algorithm: Path-Guided Stochastic Gradient Descent (PG-SGD). PG-SGD uses the genomes, represented in the pangenome graph as paths, as an embedded positional system to sample genomic distances between pairs of nodes. This avoids the quadratic cost seen in previous versions of graph drawing by Stochastic Gradient Descent (SGD). We show that our implementation efficiently computes low-dimensional layouts of gigabase-scale pangenome graphs, unveiling their biological features. Availability: We integrated PG-SGD into ODGI, which is released as free software under the MIT open source license. Source code is available at https://github.com/pangenome/odgi.
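The path-guided sampling idea can be sketched as follows. This is a minimal Python illustration, not ODGI's implementation: it assumes node lengths in bp and paths given as lists of node IDs, borrows the 1/d² stress weight and an exponentially annealed step size from classic SGD graph drawing, and the names `path_offsets`, `path_guided_sgd` and `stress` are hypothetical. The point it shows is that target distances come from nucleotide offsets along a sampled path, so no all-pairs shortest-path matrix is ever built.

```python
import math
import random


def path_offsets(path, node_lengths):
    """Start position (bp) of each step along a path: the path acts as a coordinate system."""
    offsets, acc = [], 0
    for node in path:
        offsets.append(acc)
        acc += node_lengths[node]
    return offsets


def path_guided_sgd(node_lengths, paths, iterations=30, steps_per_iter=200, seed=0):
    """2D layout by path-guided SGD: node pairs are sampled along paths, not from all pairs."""
    rng = random.Random(seed)
    offsets = [path_offsets(p, node_lengths) for p in paths]
    xy = {n: [rng.random(), rng.random()] for n in node_lengths}
    # Step size annealed from eta_max (1 / smallest weight) down to an assumed eta_min.
    d_max = max(max(o) for o in offsets) or 1
    eta_max, eta_min = d_max ** 2, 0.01
    decay = math.log(eta_max / eta_min) / max(iterations - 1, 1)
    for t in range(iterations):
        eta = eta_max * math.exp(-decay * t)
        for _ in range(steps_per_iter):
            k = rng.randrange(len(paths))
            if len(paths[k]) < 2:
                continue
            i, j = rng.sample(range(len(paths[k])), 2)
            a, b = paths[k][i], paths[k][j]
            d = abs(offsets[k][i] - offsets[k][j])  # target distance read off the path
            if a == b or d == 0:
                continue
            dx, dy = xy[a][0] - xy[b][0], xy[a][1] - xy[b][1]
            mag = math.hypot(dx, dy) or 1e-9
            mu = min(eta / d ** 2, 1.0)  # weight 1/d^2, capped at 1 to avoid overshoot
            r = mu * (mag - d) / (2 * mag)
            xy[a][0] -= r * dx
            xy[a][1] -= r * dy
            xy[b][0] += r * dx
            xy[b][1] += r * dy
    return xy


def stress(xy, node_lengths, paths):
    """Weighted layout stress over all node pairs on each path (used only for checking)."""
    total = 0.0
    for p in paths:
        off = path_offsets(p, node_lengths)
        for i in range(len(p)):
            for j in range(i + 1, len(p)):
                d = abs(off[i] - off[j])
                if p[i] != p[j] and d:
                    (x1, y1), (x2, y2) = xy[p[i]], xy[p[j]]
                    total += ((math.hypot(x1 - x2, y1 - y2) - d) / d) ** 2
    return total
```

For a simple linear path of ten 10-bp nodes, the stress of the returned layout should fall well below that of the random starting layout (obtained with `iterations=0`). Each update touches only two nodes, so the cost per step is constant regardless of graph size.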

4.
Front Genet ; 14: 1225248, 2023.
Article in English | MEDLINE | ID: mdl-37636268

ABSTRACT

Whole genome sequencing has revolutionized infectious disease surveillance for tracking and monitoring the spread and evolution of pathogens. However, using a linear reference genome for genomic analyses may introduce biases, especially when studies are conducted on highly variable bacterial genomes of the same species. Pangenome graphs provide an efficient model for representing and analyzing multiple genomes and their variants as a graph structure that includes all types of variations. In this study, we present a practical bioinformatics pipeline that employs the PanGenome Graph Builder and the Variation Graph toolkit to build pangenomes from assembled genomes, align whole genome sequencing data and call variants against a graph reference. The pangenome graph enables the identification of structural variants, rearrangements, and small variants (e.g., single nucleotide polymorphisms and insertions/deletions) simultaneously. We demonstrate that using a pangenome graph, instead of a single linear reference genome, improves mapping rates and variant calling for both simulated and real datasets of the pathogen Neisseria meningitidis. Overall, pangenome graphs offer a promising approach for comparative genomics and comprehensive genetic variation analysis in infectious disease. Moreover, this innovative pipeline, leveraging pangenome graphs, can bridge variant analysis, genome assembly, population genetics, and evolutionary biology, expanding the reach of genomic understanding and applications.

5.
Front Toxicol ; 5: 1162749, 2023.
Article in English | MEDLINE | ID: mdl-37389175

ABSTRACT

Of the nearly 1 million military personnel who participated in the 1990-1991 Gulf War, between 25% and 35% became ill with what is now referred to as Gulf War Illness (GWI) by the Department of Defense. Symptoms varied from gastrointestinal distress to lethargy, memory loss, inability to concentrate, depression, respiratory, and reproductive problems. The symptoms have persisted for 30 years in those afflicted, but the basis of the illness remains largely unknown. Nerve agents and other chemical exposures in the war zone have been implicated, but the long-term effects of these acute exposures have left few if any identifiable signatures. The major aim of this study is to elucidate the possible genomic basis for the persistence of symptoms, especially of the neurological and behavioral effects. To address this, we performed a whole genome epigenetic analysis of the proposed cause of GWI, viz., exposure to organophosphate neurotoxicants combined with high circulating glucocorticoids, in two inbred mouse strains, C57BL/6J and DBA/2J. The animals received corticosterone in their drinking water for 7 days followed by injection of diisopropylfluorophosphate (DFP), a nerve agent surrogate. Six weeks after DFP injection, the animals were euthanized and the medial prefrontal cortex was harvested for genome-wide DNA methylation analysis using high-throughput sequencing. We observed 67 differentially methylated genes, notably among them Ttll7, Akr1c14, Slc44a4, and Rusc2, all related to different symptoms of GWI. Our results support proof of principle of genetic differences in the chronic effects of GWI-related exposures and may reveal why the disease has persisted in many of the now aging Gulf War veterans.

6.
Nature ; 617(7960): 312-324, 2023 05.
Article in English | MEDLINE | ID: mdl-37165242

ABSTRACT

Here the Human Pangenome Reference Consortium presents a first draft of the human pangenome reference. The pangenome contains 47 phased, diploid assemblies from a cohort of genetically diverse individuals¹. These assemblies cover more than 99% of the expected sequence in each genome and are more than 99% accurate at the structural and base pair levels. Based on alignments of the assemblies, we generate a draft pangenome that captures known variants and haplotypes and reveals new alleles at structurally complex loci. We also add 119 million base pairs of euchromatic polymorphic sequences and 1,115 gene duplications relative to the existing reference GRCh38. Roughly 90 million of the additional base pairs are derived from structural variation. Using our draft pangenome to analyse short-read data reduced small variant discovery errors by 34% and increased the number of structural variants detected per haplotype by 104% compared with GRCh38-based workflows, which enabled the typing of the vast majority of structural variant alleles per sample.


Subject(s)
Human Genome , Genomics , Humans , Diploidy , Human Genome/genetics , Haplotypes/genetics , DNA Sequence Analysis , Genomics/standards , Reference Standards , Cohort Studies , Alleles , Genetic Variation
7.
bioRxiv ; 2023 Sep 28.
Article in English | MEDLINE | ID: mdl-37214860

ABSTRACT

The seventh iteration of the reference genome assembly for Rattus norvegicus-mRatBN7.2-corrects numerous misplaced segments and reduces base-level errors by approximately 9-fold and increases contiguity by 290-fold compared to its predecessor. Gene annotations are now more complete, significantly improving the mapping precision of genomic, transcriptomic, and proteomics data sets. We jointly analyzed 163 short-read whole genome sequencing datasets representing 120 laboratory rat strains and substrains using mRatBN7.2. We defined ~20.0 million sequence variations, of which 18.7 thousand are predicted to potentially impact the function of 6,677 genes. We also generated a new rat genetic map from 1,893 heterogeneous stock rats and annotated transcription start sites and alternative polyadenylation sites. The mRatBN7.2 assembly, along with the extensive analysis of genomic variations among rat strains, enhances our understanding of the rat genome, providing researchers with an expanded resource for studies involving rats.

8.
bioRxiv ; 2023 Apr 06.
Article in English | MEDLINE | ID: mdl-37066137

ABSTRACT

Pangenome graphs can represent all variation between multiple genomes, but existing methods for constructing them are biased due to reference-guided approaches. In response, we have developed PanGenome Graph Builder (PGGB), a reference-free pipeline for constructing unbiased pangenome graphs. PGGB uses all-to-all whole-genome alignments and learned graph embeddings to build and iteratively refine a model in which we can identify variation, measure conservation, detect recombination events, and infer phylogenetic relationships.

9.
New Phytol ; 237(6): 2360-2374, 2023 03.
Article in English | MEDLINE | ID: mdl-36457296

ABSTRACT

To establish persistent infections in host plants, herbivorous invaders, such as root-knot nematodes, must rely on effectors for suppressing damage-induced jasmonate-dependent host defenses. However, at present, the effector mechanisms targeting the biosynthesis of biologically active jasmonates to avoid adverse host responses are unknown. Using yeast two-hybrid, in planta co-immunoprecipitation, and mutant analyses, we identified 12-oxophytodienoate reductase 2 (OPR2) as an important host target of the stylet-secreted effector MiMSP32 of the root-knot nematode Meloidogyne incognita. MiMSP32 has no informative sequence similarities with other functionally annotated genes but was selected for the discovery of novel effector mechanisms based on evidence of positive, diversifying selection. OPR2 catalyzes the conversion of a derivative of 12-oxophytodienoate to jasmonic acid (JA) and operates parallel to 12-oxophytodienoate reductase 3 (OPR3), which controls the main pathway in the biosynthesis of jasmonates. We show that MiMSP32 targets OPR2 to promote parasitism of M. incognita in host plants independent of OPR3-mediated JA biosynthesis. Artificially manipulating the conversion of the 12-oxophytodienoate by OPRs increases susceptibility to multiple unrelated plant invaders. Our study is the first to shed light on a novel effector mechanism targeting this process to regulate the susceptibility of host plants.


Subject(s)
Oxidoreductases Acting on CH-CH Group Donors , Tylenchoidea , Animals , Oxidoreductases Acting on CH-CH Group Donors/metabolism , Oxidoreductases/metabolism , Biological Transport , Tylenchoidea/physiology , Plant Diseases
10.
Bioinformatics ; 38(13): 3319-3326, 2022 06 27.
Article in English | MEDLINE | ID: mdl-35552372

ABSTRACT

MOTIVATION: Pangenome graphs provide a complete representation of the mutual alignment of collections of genomes. These models offer the opportunity to study the entire genomic diversity of a population, including structurally complex regions. Nevertheless, analyzing hundreds of gigabase-scale genomes using pangenome graphs is difficult as it is not well-supported by existing tools. Hence, fast and versatile software is required to ask advanced questions to such data in an efficient way. RESULTS: We wrote Optimized Dynamic Genome/Graph Implementation (ODGI), a novel suite of tools that implements scalable algorithms and has an efficient in-memory representation of DNA pangenome graphs in the form of variation graphs. ODGI supports pre-built graphs in the Graphical Fragment Assembly format. ODGI includes tools for detecting complex regions, extracting pangenomic loci, removing artifacts, exploratory analysis, manipulation, validation and visualization. Its fast parallel execution facilitates routine pangenomic tasks, as well as pipelines that can quickly answer complex biological questions of gigabase-scale pangenome graphs. AVAILABILITY AND IMPLEMENTATION: ODGI is published as free software under the MIT open source license. Source code can be downloaded from https://github.com/pangenome/odgi and documentation is available at https://odgi.readthedocs.io. ODGI can be installed via Bioconda https://bioconda.github.io/recipes/odgi/README.html or GNU Guix https://github.com/pangenome/odgi/blob/master/guix.scm. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Genome , Software , Genomics , Algorithms , Documentation
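
ODGI reads pre-built graphs in the Graphical Fragment Assembly (GFA) format. As a rough illustration of what such a file encodes, the following minimal Python sketch (not ODGI code; `parse_gfa` and `path_sequence` are hypothetical names) reads GFA 1 segment (`S`), link (`L`) and path (`P`) lines and spells out each genome's sequence from its path. It assumes blunt (`0M`) overlaps, ignores optional tags, and does not validate that paths follow existing links.

```python
from collections import defaultdict

_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def parse_gfa(text):
    """Parse minimal GFA 1: segments, links and paths (other record types are skipped)."""
    segments, links, paths = {}, defaultdict(list), {}
    for line in text.splitlines():
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        if fields[0] == "S":    # S <name> <sequence> [tags]
            segments[fields[1]] = fields[2]
        elif fields[0] == "L":  # L <from> <from_orient> <to> <to_orient> <overlap>
            links[(fields[1], fields[2])].append((fields[3], fields[4]))
        elif fields[0] == "P":  # P <path_name> <seg1+,seg2-,...> <overlaps>
            paths[fields[1]] = [(s[:-1], s[-1]) for s in fields[2].split(",")]
    return segments, dict(links), paths


def path_sequence(segments, steps):
    """Concatenate oriented segment sequences; '-' means reverse complement."""
    parts = []
    for seg, orient in steps:
        seq = segments[seg]
        parts.append(seq if orient == "+" else seq.translate(_COMPLEMENT)[::-1])
    return "".join(parts)
```

In a small bubble graph where genome A walks `1+,2+,4+` and genome B walks `1+,3-,4+`, both genomes share segments 1 and 4 and differ only inside the bubble, which is exactly the kind of variation a variation graph stores once rather than per genome.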
11.
PLoS Comput Biol ; 18(5): e1009123, 2022 05.
Article in English | MEDLINE | ID: mdl-35639788

ABSTRACT

Since its introduction in 2011, the variant call format (VCF) has been widely adopted for processing DNA and RNA variants in practically all population studies, as well as in somatic and germline mutation studies. The VCF format can represent single nucleotide variants, multi-nucleotide variants, insertions and deletions, and simple structural variants called and anchored against a reference genome. Here we present a spectrum of over 125 useful, complementary free and open source software tools and libraries that we wrote and made available through the vcflib, bio-vcf, cyvcf2, hts-nim and slivar projects. These tools are applied for comparison, filtering, normalisation, smoothing and annotation of VCF, as well as output of statistics, visualisation, and transformations of variant files. These tools run every day in critical biomedical pipelines and countless shell scripts. Our tools are part of the wider bioinformatics ecosystem and we highlight best practices. We briefly discuss the design of VCF, lessons learnt, and how we can address more complex variation, which cannot easily be represented by the VCF format, through pangenome graph formats.


Subject(s)
Ecosystem , Genetic Variation , Computational Biology , Genetic Variation/genetics , Nucleotides , Software
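
The VCF variant classes listed above can largely be told apart from the REF and ALT columns alone. The sketch below is a simplified Python illustration under that assumption, not code from vcflib or the other projects named; `classify_variant` is a hypothetical name. It does not normalise alleles: a normaliser would trim shared flanking bases, turning some of these MNV calls into SNVs.

```python
def classify_variant(vcf_line):
    """Classify each ALT allele of one VCF data line from its REF/ALT strings."""
    fields = vcf_line.rstrip("\n").split("\t")
    chrom, pos, ref, alts = fields[0], int(fields[1]), fields[3], fields[4].split(",")
    kinds = []
    for alt in alts:
        if alt.startswith("<") or "[" in alt or "]" in alt or alt in (".", "*"):
            # Symbolic SVs (<DEL>, <INV>...), breakends, missing or overlapping-deletion alleles.
            kinds.append("SYMBOLIC")
        elif len(ref) == len(alt) == 1:
            kinds.append("SNV")
        elif len(ref) == len(alt):
            kinds.append("MNV")
        elif len(alt) > len(ref) and alt.startswith(ref):
            kinds.append("INS")  # anchored insertion: ALT extends the padding base(s)
        elif len(ref) > len(alt) and ref.startswith(alt):
            kinds.append("DEL")  # anchored deletion
        else:
            kinds.append("COMPLEX")
    return chrom, pos, kinds
```

For example, a line with `REF=A` and `ALT=AT,ACG` yields two insertions, while `REF=AC` with `ALT=GT,<DEL>` yields one MNV and one symbolic structural-variant allele.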
12.
G3 (Bethesda) ; 12(5)2022 05 06.
Article in English | MEDLINE | ID: mdl-35285473

ABSTRACT

Interpreting and integrating results from omics studies typically requires a comprehensive and time-consuming survey of extant literature. GeneCup is a literature mining web service that retrieves sentences containing user-provided gene symbols and keywords from PubMed abstracts. The keywords are organized into an ontology and can be extended to include results from human genome-wide association studies. We provide a drug addiction keyword ontology that contains over 300 keywords as an example. The literature search is conducted by querying the PubMed server using a programming interface, which is followed by retrieving abstracts from a local copy of the PubMed archive. The main results presented to the user are sentences where gene symbol and keywords co-occur. These sentences are presented through an interactive graphical interface or as tables. All results are linked to the original abstract in PubMed. In addition, a convolutional neural network is employed to distinguish sentences describing systemic stress from those describing cellular stress. The automated and comprehensive search strategy provided by GeneCup facilitates the integration of new discoveries from omic studies with existing literature. GeneCup is free and open source software. The source code of GeneCup and the link to a running instance are available at https://github.com/hakangunturkun/GeneCup.


Subject(s)
Genome-Wide Association Study , Software , Humans , Internet , PubMed , User-Computer Interface
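
The core co-occurrence step can be sketched in a few lines of Python. This is an illustrative assumption rather than GeneCup's code: it skips the PubMed query and local archive lookup, treats the keyword ontology as a flat list, matches the gene symbol case-sensitively and keywords case-insensitively, and uses a naive regular-expression sentence splitter. `cooccurring_sentences` is a hypothetical name.

```python
import re


def cooccurring_sentences(abstract, gene, keywords):
    """Return (sentence, matched_keywords) for sentences naming the gene and a keyword."""
    sentences = re.split(r"(?<=[.!?])\s+", abstract.strip())
    gene_re = re.compile(r"\b" + re.escape(gene) + r"\b")  # symbols are case-sensitive
    kw_res = [re.compile(r"\b" + re.escape(k) + r"\b", re.IGNORECASE) for k in keywords]
    hits = []
    for sentence in sentences:
        if gene_re.search(sentence):
            matched = [k for k, rx in zip(keywords, kw_res) if rx.search(sentence)]
            if matched:
                hits.append((sentence, matched))
    return hits
```

Returning the matched keywords alongside each sentence makes it easy to group results by ontology term, roughly in the spirit of the tabular view the abstract describes.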
13.
Leuk Res ; 114: 106804, 2022 03.
Article in English | MEDLINE | ID: mdl-35182904

ABSTRACT

Leukemia is a group of malignancies of the blood forming tissues, and is characterized by the uncontrolled proliferation of blood cells. In the United States, it accounts for approximately 3.5% and 4% of all cancer-related incidences and mortalities, respectively. The current study aimed to explore the role of Bcl2 and associated genes in leukemia pathogenesis using a systems genetics approach. The transcriptome data from BXD Recombinant Inbred (RI) mice was analyzed to identify the expression of Bcl2 in myeloid cells. eQTL mapping was performed to select the potential chromosomal region and subsequently identify the candidate gene modulating the expression of Bcl2. Furthermore, gene enrichment and protein-protein interaction (PPI) analyses of the Bcl2-coexpressed genes were performed to demonstrate the role of Bcl2 in leukemia pathogenesis. The Bcl2-coexpressed genes were found to be enriched in various hematopoietic system related functions, and multiple pathways related to signaling, immune response, and cancer. The PPI network analysis demonstrated direct interaction of hematopoietic function related genes, such as Bag3, Bak1, Bcl2l11, Bmf, Mapk9, Myc, Ppp2r5c, and Ppp3ca with Bcl2. The eQTL mapping identified a 4.5 Mb genomic region on chromosome 11, potentially regulating the expression of Bcl2. A multi-criteria filtering process identified Top2a, among the genes located in the mapped locus, as the best candidate upstream regulator for Bcl2 expression variation. Hence, the current study provides better insights into the role of Bcl2 in leukemia pathogenesis and demonstrates the significance of our approach in gaining new knowledge on leukemia. Furthermore, our findings from the PPI network analysis and eQTL mapping provide supporting evidence of leukemia-associated genes, which can be further explored for their functional importance in leukemia. 
DATA AVAILABILITY: The myeloid cell transcriptomic data of the BXD mice used in this study can be accessed through our GeneNetwork (http://www.genenetwork.org) with the accession number of GN144.


Subject(s)
Genomics , Leukemia , Signal Transducing Adaptor Proteins , Animals , Apoptosis Regulatory Proteins , Humans , Leukemia/genetics , Mice , Phenotype , Proto-Oncogene Proteins c-bcl-2/genetics
14.
G3 (Bethesda) ; 11(12)2021 12 08.
Article in English | MEDLINE | ID: mdl-34499130

ABSTRACT

The BXD family of mouse strains is an important reference population for systems biology and genetics that has been fully sequenced and deeply phenotyped. To facilitate interactive use of genotype-phenotype relations using many massive omics data sets for this and other segregating populations, we have developed new algorithms and code that enable near-real-time whole-genome quantitative trait locus (QTL) scans for up to one million traits. By using easily parallelizable operations including matrix multiplication, vectorized operations, and element-wise operations, our method is more than 700 times faster than an R/qtl linear model genome scan using 16 threads. We used parallelization of different CPU threads as well as GPUs. We found that the speed advantage of GPUs is dependent on problem size and shape (the number of cases, number of genotypes, and number of traits). Our approach is ideal for interactive web services, such as GeneNetwork.org, that need to display results in real time. Our implementation is available as the Julia language package LiteQTL at https://github.com/senresearch/LiteQTL.jl.


Subject(s)
Algorithms , Software , Animals , Genotype , Mice , Phenotype , Quantitative Trait Loci
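
The speed-up rests on expressing a whole-genome scan as matrix algebra. A minimal NumPy sketch of that idea (not LiteQTL, which is written in Julia; `lod_scan` is a hypothetical name) standardizes genotype and trait columns so that a single matrix product yields every marker-trait correlation r, then converts it to a LOD score with LOD = -(n/2) log10(1 - r²). That identity holds for a single-marker linear model; the sketch assumes no covariates, no missing data, and no constant columns.

```python
import numpy as np


def lod_scan(genotypes, traits):
    """LOD scores for every (marker, trait) pair from one matrix multiplication.

    genotypes: (n_individuals, n_markers); traits: (n_individuals, n_traits).
    """
    n = genotypes.shape[0]
    # Standardize columns (population SD, ddof=0) so that G.T @ Y / n is Pearson's r.
    g = (genotypes - genotypes.mean(axis=0)) / genotypes.std(axis=0)
    y = (traits - traits.mean(axis=0)) / traits.std(axis=0)
    r = g.T @ y / n  # (n_markers, n_traits) correlation matrix
    # Simple regression: RSS1 / RSS0 = 1 - r^2, so LOD = (n/2) log10(RSS0 / RSS1).
    return -(n / 2) * np.log10(1 - r ** 2)
```

Because the whole scan is one dense product plus element-wise operations, the same code maps directly onto multithreaded BLAS or a GPU array library, which is consistent with the abstract's observation that the GPU advantage depends on matrix shape.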
15.
Front Genet ; 12: 659012, 2021.
Article in English | MEDLINE | ID: mdl-34367237

ABSTRACT

Cannabinoid receptor 1 activation by the major psychoactive component in cannabis, Δ9-tetrahydrocannabinol (THC), produces motor impairments, hypothermia, and analgesia upon acute exposure. In previous work, we demonstrated significant sex and strain differences in acute responses to THC following administration of a single dose (10 mg/kg, i.p.) in C57BL/6J (B6) and DBA/2J (D2) inbred mice. To determine the extent to which these differences are heritable, we quantified acute responses to a single dose of THC (10 mg/kg, i.p.) in males and females from 20 members of the BXD family of inbred strains derived by crossing and inbreeding B6 and D2 mice. Acute THC responses (initial sensitivity) were quantified as changes from baseline for: 1. spontaneous activity in the open field (mobility), 2. body temperature (hypothermia), and 3. tail withdrawal latency to a thermal stimulus (antinociception). Initial sensitivity to the immobilizing, hypothermic, and antinociceptive effects of THC varied substantially across the BXD family. Heritability was highest for mobility and hypothermia traits, indicating that segregating genetic variants modulate initial sensitivity to THC. We identified genomic loci and candidate genes, including Ndufs2, Scp2, Rps6kb1 or P70S6K, Pde4d, and Pten, that may control variation in THC initial sensitivity. We also detected strong correlations between initial responses to THC and legacy phenotypes related to intake or response to other drugs of abuse (cocaine, ethanol, and morphine). Our study demonstrates the feasibility of mapping genes and variants modulating THC responses in the BXDs to systematically define biological processes and liabilities associated with drug use and abuse.
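The abstract does not state how heritability was estimated. One common approach for inbred panels such as the BXDs, shown here only as an assumed illustration, is a one-way ANOVA across strains: animals within a strain are isogenic, so within-strain variance estimates environmental variance and the between-strain component estimates genetic variance. `broad_sense_heritability` is a hypothetical name, and the sketch assumes the same number of replicates per strain.

```python
from statistics import mean


def broad_sense_heritability(strain_values):
    """H^2 = V_G / (V_G + V_E) from replicate measurements within inbred strains.

    strain_values: dict mapping strain -> list of phenotype values (balanced design).
    """
    groups = list(strain_values.values())
    k, n = len(groups), len(groups[0])
    if any(len(g) != n for g in groups):
        raise ValueError("this sketch assumes equal replicates per strain")
    grand = mean(v for g in groups for v in g)
    ms_between = n * sum((mean(g) - grand) ** 2 for g in groups) / (k - 1)
    ms_within = sum((v - mean(g)) ** 2 for g in groups for v in g) / (k * (n - 1))
    var_genetic = max((ms_between - ms_within) / n, 0.0)  # truncate negative estimates at 0
    return var_genetic / (var_genetic + ms_within)
```

With two strains measured as [1, 3] and [5, 7], the between-strain mean square is 16 and the within-strain mean square is 2, giving V_G = 7 and H² = 7/9.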

16.
Genes Brain Behav ; : e12738, 2021 Apr 23.
Article in English | MEDLINE | ID: mdl-33893716

ABSTRACT

The National Institute on Drug Abuse and Joint Institute for Biological Sciences at the Oak Ridge National Laboratory hosted a meeting attended by a diverse group of scientists with expertise in substance use disorders (SUDs), computational biology, and FAIR (Findability, Accessibility, Interoperability, and Reusability) data sharing. The meeting's objective was to discuss and evaluate better strategies to integrate genetic, epigenetic, and 'omics data across human and model organisms to achieve deeper mechanistic insight into SUDs. Specific topics were to (a) evaluate the current state of substance use genetics and genomics research and fundamental gaps, (b) identify opportunities and challenges of integration and sharing across species and data types, (c) identify current tools and resources for integration of genetic, epigenetic, and phenotypic data, (d) discuss steps and impediments related to data integration, and (e) outline future steps to support more effective collaboration, particularly between animal model research communities and human genetics and clinical research teams. This review summarizes key facets of this catalytic discussion with a focus on new opportunities and gaps in resources and knowledge on SUDs.

17.
Cell Syst ; 12(3): 235-247.e9, 2021 03 17.
Article in English | MEDLINE | ID: mdl-33472028

ABSTRACT

The challenge of precision medicine is to model complex interactions among DNA variants, phenotypes, development, environments, and treatments. We address this challenge by expanding the BXD family of mice to 140 fully isogenic strains, creating a uniquely powerful model for precision medicine. This family segregates for 6 million common DNA variants-a level that exceeds many human populations. Because each member can be replicated, heritable traits can be mapped with high power and precision. Current BXD phenomes are unsurpassed in coverage and include much omics data and thousands of quantitative traits. BXDs can be extended by a single-generation cross to as many as 19,460 isogenic F1 progeny, and this extended BXD family is an effective platform for testing causal modeling and for predictive validation. BXDs are a unique core resource for the field of experimental precision medicine.


Subject(s)
Precision Medicine , Animals , Animal Disease Models , Mice
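
The figure of 19,460 F1 progeny matches the number of ordered crosses among 140 parental strains, that is, each strain used as dam with each of the other 139 as sire, counting reciprocal crosses separately. This reading is our inference rather than a statement in the abstract:

```latex
N_{F1} = 140 \times 139 = 19{,}460
\qquad \text{(or } \tbinom{140}{2} = 9{,}730 \text{ if reciprocal crosses are pooled)}
```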
18.
J Neurosci ; 41(5): 927-936, 2021 02 03.
Article in English | MEDLINE | ID: mdl-33472826

ABSTRACT

High digital connectivity and a focus on reproducibility are contributing to an open science revolution in neuroscience. Repositories and platforms have emerged across the whole spectrum of subdisciplines, paving the way for a paradigm shift in the way we share, analyze, and reuse vast amounts of data collected across many laboratories. Here, we describe how open access web-based tools are changing the landscape and culture of neuroscience, highlighting six free resources that span subdisciplines from behavior to whole-brain mapping, circuits, neurons, and gene variants.


Subject(s)
Access to Information , Brain/physiology , Internet/trends , Nerve Net/physiology , Neurons/physiology , Animals , Brain/cytology , Datasets as Topic/trends , Gene Regulatory Networks/physiology , Humans , Nerve Net/cytology
20.
Genetics ; 215(2): 359-372, 2020 06.
Article in English | MEDLINE | ID: mdl-32327562

ABSTRACT

Sharing human genotype and phenotype data is essential to discover otherwise inaccessible genetic associations, but is a challenge because of privacy concerns. Here, we present a method of homomorphic encryption that obscures individuals' genotypes and phenotypes, and is suited to quantitative genetic association analysis. Encrypted ciphertext and unencrypted plaintext are analytically interchangeable. The encryption uses a high-dimensional random linear orthogonal transformation key that leaves the likelihood of quantitative trait data unchanged under a linear model with normally distributed errors. It also preserves linkage disequilibrium between genetic variants and associations between variants and phenotypes. It scrambles relationships between individuals: encrypted genotype dosages closely resemble Gaussian deviates, and can be replaced by quantiles from a Gaussian with negligible effects on accuracy. Likelihood-based inferences are unaffected by orthogonal encryption. These include linear mixed models to control for unequal relatedness between individuals, heritability estimation, and including covariates when testing association. Orthogonal transformations can be applied in a modular fashion for multiparty federated mega-analyses where the parties first agree to share a common set of genotype sites and covariates prior to encryption. Each then privately encrypts and shares their own ciphertext, and analyses all parties' ciphertexts. In the absence of private variants, or knowledge of the key, we show that it is infeasible to decrypt ciphertext using existing brute-force or noise-reduction attacks. We present the method as a challenge to the community to determine its security.


Subject(s)
Algorithms , Major Depressive Disorder/genetics , Human Genome , Genotype , Phenotype , Single Nucleotide Polymorphism , Privacy , Animals , Computer Security , Genome-Wide Association Study , Humans , Linkage Disequilibrium , Mice
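
The central claim, that an orthogonal key leaves linear-model inference unchanged, can be checked numerically. In this NumPy sketch (an illustration under stated assumptions, not the authors' software; `random_orthogonal_key` and `ols` are hypothetical names), individuals are rows, the key Q is an n × n random orthogonal matrix, and both the design matrix X (intercept, genotype dosage, covariate) and the phenotype y are multiplied by Q. Because QᵀQ = I, the products XᵀX and Xᵀy and the residual norm are unchanged, so least-squares estimates and the Gaussian likelihood match; the same identity preserves genotype cross-products and hence linkage disequilibrium. The sketch says nothing about the security of the scheme.

```python
import numpy as np


def random_orthogonal_key(n, rng):
    """Draw an n x n random orthogonal matrix via QR of a Gaussian matrix."""
    q, r = np.linalg.qr(rng.normal(size=(n, n)))
    return q * np.sign(np.diag(r))  # flip column signs so the draw is uniform (Haar)


def ols(X, y):
    """Least-squares coefficients and residual sum of squares."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return beta, float(resid @ resid)
```

Fitting `ols(X, y)` on plaintext and `ols(Q @ X, Q @ y)` on ciphertext should give the same coefficients and residual sum of squares up to floating-point error, even though the rows of `Q @ X` no longer correspond to individual genotypes.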